Existing learning-based multi-view stereo (MVS) methods rely on the depth range to build the 3D cost volume and may fail when the range is too large or unreliable. To address this problem, we propose DispMVS, a disparity-based MVS method built on the epipolar disparity flow (E-flow), which infers depth from the pixel movement between two views. The core of DispMVS is to construct a 2D cost volume on the image plane along the epipolar line for each pair (the reference image and one of several source images) for pixel matching, and to fuse the many depths triangulated from each pair using multi-view geometry to ensure multi-view consistency. For robustness, DispMVS starts from a randomly initialized depth map and iteratively refines it with a coarse-to-fine strategy. Experiments on the DTUMVS and Tanks & Temple datasets show that DispMVS is not sensitive to the depth range and achieves state-of-the-art results with lower GPU memory.
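As a rough illustration of how depth can be recovered from pixel movement without a predefined depth range, the sketch below covers only the special case of a rectified image pair, where the epipolar line is horizontal and depth equals focal length times baseline divided by disparity. It is not the DispMVS implementation: the function names are hypothetical, and the weighted-mean fusion rule is an assumption standing in for whatever fusion the method actually uses.

```python
def depth_from_disparity(focal_px, baseline, disparity_px):
    """Rectified two-view case: depth = focal length * baseline / disparity.

    focal_px and disparity_px are in pixels; baseline is in scene units,
    and the returned depth is in the same units as the baseline.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline / disparity_px


def fuse_depths(depths, weights=None):
    """Fuse per-pair depth estimates with a weighted mean.

    This fusion rule is an assumption made for illustration only.
    """
    if not depths:
        raise ValueError("need at least one depth estimate")
    if weights is None:
        weights = [1.0] * len(depths)
    total = sum(weights)
    return sum(d * w for d, w in zip(depths, weights)) / total


# One reference pixel matched against two source views (made-up numbers).
per_pair = [
    depth_from_disparity(1000.0, 0.5, 20.0),
    depth_from_disparity(1000.0, 0.25, 10.0),
]
fused = fuse_depths(per_pair)
```

Note that neither function takes a minimum or maximum depth: the estimate comes entirely from the matched pixel displacement and the camera geometry.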
Translated by Google Translate
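Panoramic images capture complete information about the surrounding environment at once and offer many advantages in virtual tourism, gaming, robotics, and related fields. However, progress in panoramic depth estimation has not fully resolved the distortion and discontinuity caused by commonly used projection methods. This paper proposes SphereDepth, a novel panoramic depth estimation method that predicts depth directly on a spherical mesh without projection preprocessing. The core idea is to establish the relationship between the panoramic image and the spherical mesh, and then use a deep neural network to extract features on the spherical domain to predict depth. To address the efficiency challenges posed by high-resolution panoramic data, we introduce two hyperparameters to balance inference speed and accuracy. Validated on three public panoramic datasets, SphereDepth achieves results comparable to state-of-the-art panoramic depth estimation methods. Benefiting from the spherical domain setting, SphereDepth can generate high-quality point clouds and significantly alleviates the distortion and discontinuity problems.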
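To make the link between a panoramic image and the sphere concrete, here is a minimal sketch that maps a pixel to a unit direction on the sphere and scales it by depth to obtain a 3D point. It assumes an equirectangular panorama and a y-up axis convention, neither of which is stated in the abstract, and the function names are hypothetical. It does not reproduce SphereDepth's spherical mesh or network.

```python
import math


def equirect_pixel_to_sphere(u, v, width, height):
    """Map pixel (u, v) of an equirectangular image to a unit direction.

    Assumed convention: longitude spans [-pi, pi) left to right, latitude
    spans [pi/2, -pi/2] top to bottom, pixel centres sit at half-integers.
    """
    lon = (u + 0.5) / width * 2.0 * math.pi - math.pi
    lat = math.pi / 2.0 - (v + 0.5) / height * math.pi
    x = math.cos(lat) * math.sin(lon)
    y = math.sin(lat)
    z = math.cos(lat) * math.cos(lon)
    return x, y, z


def backproject(u, v, depth, width, height):
    """Scale the unit direction by a depth value to get a 3D point."""
    x, y, z = equirect_pixel_to_sphere(u, v, width, height)
    return depth * x, depth * y, depth * z


# A point 2 units away along the direction of pixel (100, 50) in a 512x256 image.
point = backproject(100, 50, 2.0, 512, 256)
```

Applying `backproject` to every pixel of a predicted depth map yields a point cloud of the kind the abstract refers to.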
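Recently, privacy concerns about person re-identification (ReID) have attracted increasing attention, and preserving the privacy of the pedestrian images used by ReID methods is essential. De-identification (DeID) methods alleviate privacy issues by removing identity-related information from ReID data. However, most existing DeID methods tend to remove all information related to personal identity, which compromises the usability of the de-identified data for the ReID task. In this paper, we aim to develop a technique that achieves a good trade-off between privacy protection and data usability for person ReID. To this end, we propose a novel de-identification method designed explicitly for person ReID, named Person Identify Shift (PIS). PIS removes the absolute identity in a pedestrian image while preserving the identity relationship between image pairs. By exploiting the interpolation property of the variational auto-encoder, PIS shifts each pedestrian image from its current identity to another, new identity, so that the resulting images still preserve their relative identities. Experimental results show that our method achieves a better trade-off between privacy protection and model performance than existing de-identification methods and can defend against both human and model attacks to ensure data privacy.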
Improving discriminative feature learning is central to classification. Existing works address this problem by explicitly increasing inter-class separability and intra-class similarity, whether by constructing positive and negative pairs for contrastive learning or by imposing tighter class-separating margins. These methods do not exploit the similarity between different classes because they adhere to the i.i.d. assumption on the data. In this paper, we embrace the real-world data distribution setting in which some classes share semantic overlaps due to their similar appearances or concepts. Based on this hypothesis, we propose a novel regularization to improve discriminative learning. We first calibrate the estimated highest likelihood of a sample based on its semantically neighboring classes, then encourage the overall likelihood predictions to be deterministic by imposing an adaptive exponential penalty. Because the gradient of the proposed method is roughly proportional to the uncertainty of the predicted likelihoods, we name it adaptive discriminative regularization (ADR); it is trained along with a standard cross-entropy loss in classification. Extensive experiments demonstrate that it yields consistent and non-trivial performance improvements in a variety of visual classification tasks (over 10 benchmarks). Furthermore, we find that it is robust to long-tailed and noisy-label data distributions. Its flexible design makes it compatible with mainstream classification architectures and losses.
Zigzag flattening (ZF) is commonly used as the default option for ordering image patches in deep models, e.g. vision transformers (ViTs). Notably, when decomposing multi-scale images, ZF cannot maintain the invariance of feature point positions. To this end, we investigate Hilbert flattening (HF) as an alternative for sequence ordering in vision tasks. HF has been proven superior to other flattening approaches in maintaining spatial locality when performing multi-scale transformations of dimensional space. In applications, we design a position encoding method based on HF that outperforms absolute position encoding by a non-trivial margin in the Transformer architecture. It can also be used for feature down-sampling and feature/image interpolation. Extensive experiments demonstrate that it yields consistent performance boosts for several popular architectures and applications. The code will be released upon acceptance.
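The locality property of Hilbert flattening can be seen in a short sketch. The code below uses the standard iterative index-to-coordinate construction of the Hilbert curve to order the cells of a square grid whose side is a power of two. The function names are hypothetical and this is not the authors' released code; it only shows the ordering itself, not the position encoding built on it.

```python
def hilbert_index_to_xy(n, d):
    """Map position d along a Hilbert curve to (x, y) on an n x n grid.

    n must be a power of two and d must lie in 0 .. n*n - 1.
    """
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate or flip the current quadrant so sub-curves join end to end.
        if ry == 0:
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def hilbert_flatten(grid):
    """Flatten a square grid (list of rows) in Hilbert-curve order."""
    n = len(grid)
    coords = (hilbert_index_to_xy(n, d) for d in range(n * n))
    return [grid[y][x] for x, y in coords]


# Patch ids of a 4x4 patch grid, numbered row by row.
patches = [[4 * row + col for col in range(4)] for row in range(4)]
sequence = hilbert_flatten(patches)
```

Consecutive entries of `sequence` always come from grid cells that share an edge, which is the spatial-locality property the abstract attributes to HF. A row-by-row scan, by contrast, jumps across the grid at the end of every row.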
Federated deep learning frameworks can be used strategically to monitor land use locally and infer environmental impacts globally. Building a global model for land use classification requires distributed data from across the world. A federated approach suits this application domain because it avoids transferring data from distributed locations and saves network bandwidth, which reduces communication cost. We use a federated UNet model for semantic segmentation of satellite and street-view images. The novelty of the proposed architecture is the integration of knowledge distillation to reduce communication cost and response time. The accuracy obtained was above 95%, and we also achieved significant model compression, by over 17 times and 62 times for street-view and satellite images respectively. Our proposed framework has the potential to be a game-changer in real-time tracking of climate change across the planet.
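For readers unfamiliar with knowledge distillation, the sketch below shows one common formulation: a temperature-softened KL divergence between teacher and student class probabilities for a single pixel or sample. This particular loss, the temperature value, and the function names are assumptions made for illustration. The abstract does not state which distillation objective the framework uses or how it is combined with federated training.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened probabilities, scaled by T^2.

    The T^2 factor is the usual convention for keeping gradient magnitudes
    comparable across temperatures.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(
        pt * (math.log(pt) - math.log(ps))
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0.0
    )
    return temperature ** 2 * kl


# Per-pixel class logits from a large teacher and a compact student (made-up values).
loss = distillation_loss([0.2, 1.1, 2.5], [0.1, 1.0, 3.0])
```

A compact student trained against such a loss has fewer parameters to exchange in each federated round, which is the general route by which distillation can reduce communication cost.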
Air pollution is a crucial issue affecting human health and livelihoods, as well as one of the barriers to economic and social growth. Forecasting air quality has become an increasingly important endeavor with significant social impacts, especially in emerging countries like China. In this paper, we present a novel Transformer architecture termed AirFormer to collectively predict nationwide air quality in China, with an unprecedented fine spatial granularity covering thousands of locations. AirFormer decouples the learning process into two stages -- 1) a bottom-up deterministic stage that contains two new types of self-attention mechanisms to efficiently learn spatio-temporal representations; 2) a top-down stochastic stage with latent variables to capture the intrinsic uncertainty of air quality data. We evaluate AirFormer with 4-year data from 1,085 stations in the Chinese Mainland. Compared to the state-of-the-art model, AirFormer reduces prediction errors by 5%~8% on 72-hour future predictions. Our source code is available at https://github.com/yoshall/airformer.
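Fairness, a criterion that focuses on evaluating algorithm performance across different demographic groups, has attracted attention in natural language processing, recommendation systems, and facial recognition. Because medical image samples carry many demographic attributes, it is important to understand the concept of fairness, become familiar with unfairness mitigation techniques, evaluate the degree of fairness of an algorithm, and recognize the challenges of fairness issues in medical image analysis (MedIA). In this paper, we first give a comprehensive and precise definition of fairness, and then introduce the techniques currently used to address fairness issues in MedIA. After that, we list public medical image datasets that contain demographic attributes to facilitate fairness research, and summarize current algorithms concerning fairness in MedIA. To support a better understanding of fairness and draw attention to fairness-related issues in MedIA, we conduct experiments that compare the difference between fairness and data imbalance, verify the existence of unfairness in various MedIA tasks, especially classification, segmentation, and detection, and evaluate the effectiveness of unfairness mitigation algorithms. Finally, we conclude with the opportunities and challenges of fairness in MedIA.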
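Various deep learning models, especially some recent Transformer-based approaches, have greatly improved the state-of-the-art performance of long-term time series forecasting. However, these Transformer-based models suffer severe performance degradation as the input length grows, which prevents them from using extended historical information. Moreover, these methods tend to handle complex examples in long-term forecasting by increasing model complexity, which often leads to a significant increase in computation and to less robust performance (e.g., overfitting). We propose a novel neural network architecture, called TreeDRNet, for more effective long-term forecasting. Inspired by robust regression, we introduce a doubly residual link structure to make predictions more robust. Building on the Kolmogorov-Arnold representation theorem, we explicitly introduce feature selection, model ensembling, and a tree structure to further exploit the extended input sequence, which improves the robustness and representation power of TreeDRNet. Unlike previous deep models for sequential forecasting, TreeDRNet is built entirely on multilayer perceptrons and is therefore highly computationally efficient. Our extensive empirical studies show that TreeDRNet is significantly more effective than state-of-the-art methods, reducing prediction errors by 20% to 40%. In particular, TreeDRNet is more than 10 times as efficient as Transformer-based methods. The code will be released soon.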
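Recent studies have shown that deep learning models such as RNNs and Transformers bring significant performance gains to long-term time series forecasting because they effectively utilize historical information. We find, however, that there is still considerable room for improvement in how to preserve historical information in neural networks while avoiding overfitting to the noise present in the history. Addressing this issue allows better use of the capabilities of deep learning models. To this end, we design a Frequency improved Legendre Memory model, or FiLM: it applies Legendre polynomial projections to approximate historical information, uses Fourier projection to remove noise, and adds a low-rank approximation to speed up computation. Our empirical studies show that the proposed FiLM significantly improves the accuracy of state-of-the-art models in multivariate and univariate long-term forecasting, by 20.3% and 22.6% respectively. We also demonstrate that the representation module developed in this work can be used as a general plug-in to improve the long-term prediction performance of other deep learning modules. Code is available at https://github.com/tianzhou2011/film/.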